Global Terrorism Analysis

Introduction

This analysis delves into global terrorism trends, exploring how terrorist activities have evolved over time and identifying regions with significant deviations from global patterns. By examining attack success rates, prevalent tactics, and regional variations, we aim to uncover key insights into the nature of terrorist incidents worldwide. This exploration utilizes interactive plots and geographic visualizations to enhance understanding and engagement.


About the Dataset

The dataset, sourced from the Global Terrorism Database (GTD), provides comprehensive data on over 180,000 terrorist attacks from 1970 to 2017. Managed by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), this open-source repository offers detailed information on both domestic and international incidents, enabling a thorough examination of global terrorism trends.


Key Features:

  - Over 180,000 recorded incidents spanning 1970 to 2017.
  - Coverage of both domestic and international attacks.
  - Detailed attributes per incident (attack type, target, weapon, casualties).

Project Steps

  1. Detect the file encoding and load the dataset.
  2. Explore and preprocess the data.
  3. Analyze trends and compare Dask with Pandas performance.


After downloading the data to your local machine, check which encoding was used to encode the file (using `chardet`) so the file can be read correctly and a `UnicodeDecodeError` is avoided.
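A minimal sketch of the encoding check with `chardet`. Reading only the first chunk of bytes keeps the check fast on a large file; the helper name and sample size here are illustrative, not part of the original code:

```python
import chardet

def detect_encoding(path, sample_size=100_000):
    """Guess a file's encoding from its first bytes; a sample is usually enough."""
    with open(path, 'rb') as f:
        raw = f.read(sample_size)
    return chardet.detect(raw)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': ...}

# Demonstrated on in-memory bytes containing a Latin-1 character;
# for the real file you would call detect_encoding('globalterrorismdb_0718dist.csv')
sample = 'Bogotá, Colombia'.encode('ISO-8859-1')
print(chardet.detect(sample))
```

The detected encoding (ISO-8859-1 for this dataset) is then passed to `pd.read_csv(..., encoding=...)`.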

- Load and Explore the dataset

Cheat sheet showing the missing values and some other features:

```python
import time

import numpy as np
import pandas as pd

# df = df.dropna(thresh=len(df) * .6, axis=1)

numeric_cols = df.select_dtypes(include=[np.number])

# Calculating mean, median, and standard deviation
mean_values = numeric_cols.mean()
median_values = numeric_cols.median()
std_dev_values = numeric_cols.std()
mean_values

# Dask
import dask.dataframe as dd

dtype = {
    'approxdate': 'object', 'attacktype2_txt': 'object', 'attacktype3_txt': 'object',
    'claimmode2_txt': 'object', 'claimmode3_txt': 'object', 'corp2': 'object',
    'corp3': 'object', 'divert': 'object', 'gname2': 'object', 'gname3': 'object',
    'gsubname': 'object', 'gsubname2': 'object', 'gsubname3': 'object',
    'guncertain1': 'float64', 'hostkidoutcome_txt': 'object', 'ishostkid': 'float64',
    'natlty1': 'float64', 'natlty2_txt': 'object', 'natlty3_txt': 'object',
    'ransom': 'float64', 'ransomnote': 'object', 'related': 'object',
    'resolution': 'object', 'specificity': 'float64', 'target2': 'object',
    'target3': 'object', 'targsubtype1': 'float64', 'targsubtype2_txt': 'object',
    'targsubtype3_txt': 'object', 'targtype2_txt': 'object', 'targtype3_txt': 'object',
    'weapsubtype2_txt': 'object', 'weapsubtype3_txt': 'object',
    'weapsubtype4_txt': 'object', 'weaptype2_txt': 'object',
    'weaptype3_txt': 'object', 'weaptype4_txt': 'object',
}

start_time = time.time()
# Read the CSV file into a Dask DataFrame
df_dd = dd.read_csv('globalterrorismdb_0718dist.csv', dtype=dtype,
                    encoding='ISO-8859-1', low_memory=False)
dask_duration = time.time() - start_time
dask_duration
df_dd.head()
```

- Numerical Features

- Categorical Features

Time to analyze!


Given the extensive number of columns in the dataset, we'll focus on selecting only the key columns for data preprocessing to ensure a more efficient and manageable analysis. By concentrating on the most relevant columns, we can streamline our efforts and derive meaningful insights from the dataset.
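The column selection can be sketched as below. The raw-to-friendly name mapping is an assumption based on the public GTD codebook (`iyear`, `country_txt`, `nkill`, etc.); the tiny stand-in frame only illustrates the mechanics, since the real data would come from the loaded CSV:

```python
import pandas as pd

# Hypothetical mapping from raw GTD column names to friendlier labels
key_cols = {
    'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day',
    'country_txt': 'Country', 'region_txt': 'Region', 'city': 'City',
    'attacktype1_txt': 'AttackType', 'targtype1_txt': 'TargetType',
    'gname': 'Group', 'weaptype1_txt': 'WeaponType',
    'nkill': 'Killed', 'nwound': 'Wounded', 'success': 'Success',
}

# Stand-in for df = pd.read_csv('globalterrorismdb_0718dist.csv', ...)
df = pd.DataFrame({c: [0] for c in key_cols})

# Keep only the key columns and rename them
df_terr = df[list(key_cols)].rename(columns=key_cols)
print(df_terr.columns.tolist())
```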


```python
# Identify the most frequent values in categorical columns.
cate_cols = df_terr.select_dtypes(include=['object'])
cate_cols.describe().loc['top']  # most frequent value per categorical column
```

> We conclude from here that most terrorist attacks are concentrated in 2014.

Plot showing Terrorist Activities Each Year
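The yearly counts behind that plot can be computed with a `value_counts` on the year column. A sketch on toy data (the 'Year' column name follows the renaming assumed earlier; real counts would come from the full frame):

```python
import pandas as pd

# Toy stand-in for the cleaned GTD frame
df_terr = pd.DataFrame({'Year': [2014, 2014, 2014, 2007, 1999]})

# Number of attacks per year, in chronological order
attacks_per_year = df_terr['Year'].value_counts().sort_index()
print(attacks_per_year)
# A bar plot would follow, e.g. attacks_per_year.plot(kind='bar')
```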


Total terrorist attacks per region.



Sheet showing the number of terrorism attacks per region each year.
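A region-by-year sheet like this can be built with `pd.crosstab`. A sketch on toy data ('Year'/'Region' column names are assumptions carried over from the column selection above):

```python
import pandas as pd

df_terr = pd.DataFrame({
    'Year':   [2014, 2014, 2015, 2015, 2015],
    'Region': ['Middle East & North Africa', 'South Asia',
               'Middle East & North Africa', 'South Asia', 'South Asia'],
})

# Rows: years; columns: regions; cells: attack counts
region_year = pd.crosstab(df_terr['Year'], df_terr['Region'])
print(region_year)
```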


Total terrorist strikes per country


From the graph we can see that the five countries most affected by terrorist attacks are:

  1. Iraq
  2. Pakistan
  3. Afghanistan
  4. India
  5. Colombia

Sheet showing the number of terrorism attacks per country each year.


Total casualties (killed and wounded) for each country, grouped by region.







The type of attack and its impact on the number of casualties (killed and wounded).


```python
# To get the overall casualties (killed and wounded).
df_terr['casualties'] = df_terr['Killed'].fillna(0) + df_terr['Wounded'].fillna(0)
```
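With the `casualties` column in place, the impact of each attack type can be summarized with a groupby. A sketch on toy data ('AttackType', 'Killed', and 'Wounded' are the renamed columns assumed earlier):

```python
import pandas as pd

df_terr = pd.DataFrame({
    'AttackType': ['Bombing/Explosion', 'Armed Assault', 'Bombing/Explosion'],
    'Killed':  [3.0, 1.0, None],
    'Wounded': [10.0, None, 4.0],
})

# Treat missing counts as zero, then total casualties per attack type
df_terr['casualties'] = df_terr['Killed'].fillna(0) + df_terr['Wounded'].fillna(0)
by_type = (df_terr.groupby('AttackType')['casualties']
                  .sum()
                  .sort_values(ascending=False))
print(by_type)
```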


Now let us check which terrorist organizations have carried out operations in each country.



By looking at the data and doing some quick arithmetic, we find that records with an unknown perpetrator group represent 45% of the total data, and we will work to address this.


Looking at the remaining unknown data, it represents only 3.24% of the overall dataset, which we can ignore without affecting the analysis.
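The share of unknown perpetrators can be checked directly with a boolean mean. A sketch on toy data (the 'Group' column and the literal 'Unknown' label are assumptions based on the GTD's `gname` field):

```python
import pandas as pd

df_terr = pd.DataFrame({'Group': ['Unknown', 'Taliban', 'ISIL', 'Unknown']})

# Fraction of records whose perpetrator group is 'Unknown', as a percentage
unknown_share = (df_terr['Group'] == 'Unknown').mean() * 100
print(f"Unknown groups: {unknown_share:.2f}% of records")
```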

```python
# Pillow
# imagemagick
# !apt-get install imagemagick
# !conda install -c conda-forge imagemagick
```

Animation showing the spread of terrorist activities by country over the past years.


Distribution showing the total terrorist attacks by country during the past years and the most affected countries.


--------------------> Performance Comparison with Dask <--------------------------

```python
!python -m pip install "dask[dataframe]" --upgrade
!pip install --upgrade dask
```

Summary


Compare the performance and memory usage of Dask operations with Pandas.

  1. Setup and Create a Large Synthetic Dataset.
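Step 1 can be sketched by writing a synthetic CSV with the `folders`/`files` columns that the benchmark groups on. The row count here is deliberately small for illustration; scale `n` up (e.g. to tens of millions of rows) for a meaningful comparison:

```python
import numpy as np
import pandas as pd

n = 100_000  # increase for a realistic benchmark
rng = np.random.default_rng(42)  # seeded for reproducibility

dump_df = pd.DataFrame({
    'folders': rng.integers(0, 1_000, size=n),  # grouping key
    'files':   rng.integers(0, 100, size=n),    # value to aggregate
})
dump_df.to_csv('lrg_dataset.csv', index=False)
print(dump_df.shape)
```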
```python
import time

import dask.dataframe as dd
import pandas as pd

# ---------- Measure performance with Dask ----------

# Load data
start_time = time.time()
dump_df_dask = dd.read_csv('lrg_dataset.csv')
load_time_dask = time.time() - start_time

# Group by 'folders' and compute sum of 'files'
start_time = time.time()
df_grouped_dask = dump_df_dask.groupby('folders')['files'].sum().compute()
groupby_time_dask = time.time() - start_time

# Print performance results
print(f"Dask Load Time: {load_time_dask:.2f} seconds")
print(f"Dask Groupby Time: {groupby_time_dask:.2f} seconds")

# Estimate memory usage by converting a sample to Pandas
sample_size = 10_000_000  # number of rows to sample
sample = dump_df_dask.head(sample_size)
sample_memory_usage = sample.memory_usage(deep=True).sum()
estimated_memory_usage = (sample_memory_usage / sample_size) * len(dump_df_dask)

# Estimate memory usage for the grouped data
grouped_sample_memory_usage = (
    df_grouped_dask.memory_usage(deep=True).sum()
    if isinstance(df_grouped_dask, pd.DataFrame)
    else df_grouped_dask.nbytes
)
estimated_grouped_memory_usage = grouped_sample_memory_usage

# Print memory usage results
print(f"Estimated Dask Memory Usage: {estimated_memory_usage / 1e6:.2f} MB")
print(f"Estimated Dask Grouped Data Memory Usage: {estimated_grouped_memory_usage / 1e6:.2f} MB")

# ---------- Measure performance and memory usage with Pandas ----------

# Load data
start_time = time.time()
dump_df_pandas = pd.read_csv('lrg_dataset.csv')
load_time_pandas = time.time() - start_time

# Group by 'folders' and compute sum of 'files'
start_time = time.time()
df_grouped_pandas = dump_df_pandas.groupby('folders')['files'].sum().reset_index()
groupby_time_pandas = time.time() - start_time

# Print results
print(f"Pandas Load Time: {load_time_pandas:.2f} seconds")
print(f"Pandas Groupby Time: {groupby_time_pandas:.2f} seconds")

# Memory usage
print(f"Pandas Memory Usage: {dump_df_pandas.memory_usage(deep=True).sum() / 1e6:.2f} MB")
print(f"Pandas Grouped Data Memory Usage: {df_grouped_pandas.memory_usage(deep=True).sum() / 1e6:.2f} MB")
```


The End :)